Improving 6D Pose Estimation of Objects in Clutter via Physics-aware Monte Carlo Tree Search
This work proposes a process for efficiently searching over combinations of
individual object 6D pose hypotheses in cluttered scenes, especially in cases
involving occlusions and objects resting on each other. The initial set of
candidate object poses is generated from state-of-the-art object detection and
global point cloud registration techniques. The best-scored pose per object
according to these techniques may not be accurate due to overlaps and
occlusions. Nevertheless, experiments in this work indicate that lower-ranked
object poses may be closer to the true poses than those ranked higher by the
registration techniques. This motivates a global
optimization process for improving these poses by taking into account
scene-level physical interactions between objects. It also implies that the
Cartesian product of candidate poses for interacting objects must be searched
so as to identify the best scene-level hypothesis. To perform the search
efficiently, the candidate poses for each object are clustered to reduce their
number while preserving sufficient diversity. Then, searching over the
combinations of candidate object poses is performed through a Monte Carlo Tree
Search (MCTS) process that uses the similarity between the observed depth image
of the scene and a rendering of the scene given the hypothesized pose as a
score that guides the search procedure. MCTS handles in a principled way the
tradeoff between fine-tuning the most promising poses and exploring new ones,
by using the Upper Confidence Bound (UCB) technique. Experimental results
indicate that this process is able to quickly identify in cluttered scenes
physically-consistent object poses that are significantly closer to ground
truth compared to poses found by point cloud registration methods.
Comment: 8 pages, 4 figures
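The abstract's search procedure can be illustrated with a short sketch: a tree where each level fixes the pose of one object, UCB-based selection, and a leaf score comparing the observed depth image to a rendering of the hypothesized scene. Everything below is a minimal, self-contained approximation under assumed names; in particular, scene_score stands in for the paper's renderer-plus-similarity measure and is not the authors' implementation.

```python
import math
import random

class Node:
    """One tree node; each level of the tree fixes the pose of one object."""
    def __init__(self):
        self.children = {}  # candidate index -> Node
        self.visits = 0
        self.value = 0.0

def ucb(child, parent_visits, c=1.4):
    """Upper Confidence Bound: trades off refining promising poses
    against exploring rarely tried ones."""
    if child.visits == 0:
        return float("inf")
    return (child.value / child.visits
            + c * math.sqrt(math.log(parent_visits) / child.visits))

def scene_score(assignment, observed):
    """Placeholder for the paper's score: similarity between the observed
    depth image and a rendering of the hypothesized scene."""
    rendered = [float(hash(pose) % 7) for pose in assignment]  # fake render
    return -sum(abs(r - o) for r, o in zip(rendered, observed))

def mcts_pose_search(candidates, observed, iterations=500):
    """Search the Cartesian product of per-object pose candidates.
    candidates[i] is the clustered hypothesis list for object i."""
    root = Node()
    best, best_score = None, -math.inf
    for _ in range(iterations):
        node, path = root, []
        for poses in candidates:
            if len(node.children) < len(poses):
                # Expansion: attach a pose not yet tried at this level.
                i = random.choice([k for k in range(len(poses))
                                   if k not in node.children])
                node.children[i] = Node()
            else:
                # Selection: pick the child maximizing UCB.
                i = max(node.children,
                        key=lambda k: ucb(node.children[k], node.visits))
            node = node.children[i]
            path.append((node, poses[i]))
        assignment = [pose for _, pose in path]
        score = scene_score(assignment, observed)
        if score > best_score:
            best, best_score = assignment, score
        # Backpropagation: credit the score along the visited path.
        root.visits += 1
        for n, _ in path:
            n.visits += 1
            n.value += score
    return best, best_score

# Toy usage: two objects with clustered pose hypotheses and a stand-in
# "observed depth" signal.
candidates = [["cup_poseA", "cup_poseB"], ["box_poseA", "box_poseB", "box_poseC"]]
observed = [3.0, 5.0]
print(mcts_pose_search(candidates, observed))
```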
That and There: Judging the Intent of Pointing Actions with Robotic Arms
Collaborative robotics requires effective communication between a robot and a
human partner. This work proposes a set of interpretive principles for how a
robotic arm can use pointing actions to communicate task information to people
by extending existing models from the related literature. These principles are
evaluated through studies where English-speaking human subjects view animations
of simulated robots instructing pick-and-place tasks. The evaluation
distinguishes two classes of pointing actions that arise in pick-and-place
tasks: referential pointing (identifying objects) and locating pointing
(identifying locations). The study indicates that human subjects show greater
flexibility in interpreting the intent of referential pointing compared to
locating pointing, which needs to be more deliberate. The results also
demonstrate the effects of variation in the environment and task context on the
interpretation of pointing. Our corpus, experiments, and design principles
advance models of context, common-sense reasoning, and embodied communication.
Comment: Accepted to AAAI 2020, New York City
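The abstract's distinction between the two pointing classes can be made concrete with a small geometric sketch: referential pointing tolerates imprecision because it only needs to single out the nearest object to the pointing ray, while locating pointing resolves to an exact surface point. The ray-based rules and all function names below are illustrative assumptions, not the paper's actual interpretive model.

```python
import math

def point_ray(origin, direction):
    """Normalize a pointing ray emanating from the arm's end effector."""
    norm = math.sqrt(sum(d * d for d in direction))
    return origin, tuple(d / norm for d in direction)

def referential_target(ray, objects):
    """Referential pointing: identify an object, here the one whose
    center lies closest to the ray (tolerant to imprecision)."""
    origin, d = ray
    def dist(center):
        v = tuple(c - o for c, o in zip(center, origin))
        t = max(0.0, sum(vi * di for vi, di in zip(v, d)))
        closest = tuple(o + t * di for o, di in zip(origin, d))
        return math.dist(center, closest)
    return min(objects, key=lambda obj: dist(obj[1]))[0]

def locating_target(ray, table_height=0.0):
    """Locating pointing: identify a location, here the exact point where
    the ray meets the support surface (hence less interpretive slack)."""
    (ox, oy, oz), (dx, dy, dz) = ray
    t = (table_height - oz) / dz
    return (ox + t * dx, oy + t * dy, table_height)

# Toy usage: one downward-pointing gesture, interpreted both ways.
ray = point_ray((0.0, 0.0, 1.0), (0.3, 0.1, -1.0))
print(referential_target(ray, [("cup", (0.3, 0.1, 0.0)),
                               ("box", (1.0, 1.0, 0.0))]))
print(locating_target(ray))
```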
ARMBench: An Object-centric Benchmark Dataset for Robotic Manipulation
This paper introduces Amazon Robotic Manipulation Benchmark (ARMBench), a
large-scale, object-centric benchmark dataset for robotic manipulation in the
context of a warehouse. Automation of operations in modern warehouses requires
a robotic manipulator to deal with a wide variety of objects, unstructured
storage, and dynamically changing inventory. Such settings pose challenges in
perceiving the identity, physical characteristics, and state of objects during
manipulation. Existing datasets for robotic manipulation consider a limited set
of objects or use 3D models to generate synthetic scenes, which limits how well
they capture the variety of object properties, clutter, and interactions. We
present a large-scale dataset collected in an Amazon warehouse using a robotic
manipulator performing object singulation from containers with heterogeneous
contents. ARMBench contains images, videos, and metadata corresponding to
235K+ pick-and-place activities on 190K+ unique objects. The data is captured
at different stages of manipulation, i.e., pre-pick, during transfer, and after
placement. High-quality annotations enable the proposed benchmark tasks, and
baseline performance evaluations are presented for three visual perception
challenges, namely 1) object segmentation in clutter, 2) object identification,
and 3) defect detection. ARMBench can be accessed at http://armbench.com
Comment: To appear at the IEEE Conference on Robotics and Automation (ICRA),
2023
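To make the dataset's structure concrete, here is a minimal sketch of how one pick-and-place activity record might be represented, covering the three capture stages and the three benchmark tasks the abstract lists. The class and all field names are hypothetical illustrations, not ARMBench's actual schema or API.

```python
from dataclasses import dataclass, field

@dataclass
class PickActivity:
    """Hypothetical record for one of the 235K+ pick-and-place activities;
    field names are illustrative, not ARMBench's real schema."""
    activity_id: str
    container_object_ids: list[str]           # heterogeneous container contents
    # stage -> image path, for the three capture stages in the abstract:
    # "pre_pick", "during_transfer", "after_placement"
    images: dict[str, str] = field(default_factory=dict)
    masks: dict[str, list] = field(default_factory=dict)  # task 1: segmentation in clutter
    picked_object_id: str = ""                            # task 2: object identification
    defect: bool = False                                  # task 3: defect detection

# Toy usage: collect the activities flagged for the defect-detection task.
activities = [
    PickActivity("a1", ["o1", "o2"], {"pre_pick": "a1_pre.jpg"},
                 picked_object_id="o1", defect=False),
    PickActivity("a2", ["o3"], {"pre_pick": "a2_pre.jpg"},
                 picked_object_id="o3", defect=True),
]
print([a.activity_id for a in activities if a.defect])
```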